Applying Word Pair Model to the Chinese Syllable-to-Word Problem

Author

  • Jia-Lin Tsai
Abstract

Syllable-to-word (STW) conversion is a main task of Chinese language processing and is fundamental to syllable/speech understanding. The two major problems of STW conversion are syllable-word segmentation and homophone selection. This paper presents a word pair model (WPM) that can effectively perform homophone selection and syllable-word segmentation to improve Chinese input systems. The STW experimental results show that: (1) the WPM achieves tonal (syllables input with four tones) and toneless (syllables input without four tones) STW accuracies of 99% and 93%, respectively, among the converted words; (2) the WPM covers 97% (on average) of the tonal and toneless STW conversions of poly-syllabic words for the testing syllables; and (3) when the WPM is applied as an adaptation step, together with the Microsoft Input Method Editor 2003 (MSIME) and an optimized bigram model (BiGram), the average tonal and toneless STW improvements are 33% and 31%, respectively.

Similar resources

Applying a Mix Word-Pair Identifier to the Chinese Syllable-to-Word Conversion Problem

This paper describes a mix word-pair (mix-WP) identifier to resolve homonym/segmentation ambiguities as well as perform STW conversion effectively for Chinese input. The mix-WP identifier includes a specific word-pair (SWP) identifier and a common word-pair (CWP) identifier. It is designed as a supporting process for Chinese input systems. Our experiments show that by applying the mix-WP iden...

Applying Meaningful Word-Pair Identifier to the Chinese Syllable-to-Word Conversion Problem

Syllable-to-word (STW) conversion is a frequently used Chinese input method that is fundamental to syllable/speech understanding. The two major problems with STW conversion are the segmentation of syllable input and the ambiguities caused by homonyms. This paper describes a meaningful word-pair (MWP) identifier that can be used to resolve homonym/segmentation ambiguities and perform STW convers...

Using Word-Pair Identifier to Improve Chinese Input System

This paper presents a word-pair (WP) identifier that can be used to resolve homonym/segmentation ambiguities and perform syllable-to-word (STW) conversion effectively for improving Chinese input systems. The experimental results show the following: (1) the WP identifier is able to achieve tonal (syllables with four tones) and toneless (syllables without four tones) STW accuracies of 98.5% and 90....

Applying an NVEF Word-Pair Identifier to the Chinese Syllable-to-Word Conversion Problem

Syllable-to-word (STW) conversion is important in Chinese phonetic input methods and speech recognition. There are two major problems in STW conversion: (1) resolving the ambiguity caused by homonyms; and (2) determining the word segmentation. This paper describes a noun-verb event-frame (NVEF) word identifier that can be used to solve these problems effectively. Our approach includes (a) an NV...

Word segmentation in Persian continuous speech using F0 contour

Word segmentation in continuous speech is a complex cognitive process. Previous research on spoken word segmentation has revealed that in fixed-stress languages, listeners use acoustic cues to stress to de-segment speech into words. It has been further assumed that stress in non-final or non-initial position hinders the demarcative function of this prosodic factor. In Persian, stress is retract...

Publication date: 2006